Relu merge optimizer pass #586
base: main
Conversation
… from their own Layer class
…e the relu layer class is not simply called Activation anymore. Need to fix
… layer index error
Hi @oliviaweng thanks a lot for the contribution, it looks great! It seems some tests are failing: https://gitlab.cern.ch/fastmachinelearning/hls4ml/-/pipelines/4158655 I think most of the failures are relatively easy to fix, e.g.:
```python
def match(self, node):
    supported_layers = (Dense, Conv2D, Conv2DBatchnorm)

    is_match = issubclass(node.get_input_node().__class__, supported_layers)
    # ReLU layers are of class Activation
    is_match = is_match and issubclass(node.__class__, Activation)
    return is_match
```
I see. For the missing `CONFIG_T::out_t` error, it looks like this `match()` function is too generous. Since all activation layers are subclasses of `Activation`, any Dense/Conv2D layer that is followed by any activation function returns True, which is wrong. I'll look into tightening this up.
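One way to tighten the match, sketched below, is to also check the node's activation attribute rather than accepting any `Activation` subclass. The stand-in `Layer`/`Dense`/`Conv2D`/`Activation` classes here are minimal mocks for illustration only; they are not the real hls4ml layer classes, and the `get_attr('activation')` check is an assumption about how the activation name is stored.

```python
# Hypothetical sketch: restrict the match so only a ReLU following a
# supported layer qualifies. The classes below are minimal stand-ins
# for the hls4ml layer classes referenced in the review, not real API.
class Layer:
    def __init__(self, input_node=None, attributes=None):
        self._input_node = input_node
        self.attributes = attributes or {}

    def get_input_node(self):
        return self._input_node

    def get_attr(self, name, default=None):
        return self.attributes.get(name, default)

class Dense(Layer): pass
class Conv2D(Layer): pass
class Conv2DBatchnorm(Conv2D): pass
class Activation(Layer): pass

def match(node):
    supported_layers = (Dense, Conv2D, Conv2DBatchnorm)
    is_match = isinstance(node.get_input_node(), supported_layers)
    # Require the node to be an activation layer...
    is_match = is_match and isinstance(node, Activation)
    # ...and specifically ReLU, so softmax/tanh/etc. no longer match.
    is_match = is_match and node.get_attr('activation') == 'relu'
    return is_match
```

With this extra check, a Dense followed by softmax no longer matches, which is exactly the over-generous case described above.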
A question on the approach itself: this will only work for ReLU, and will require duplicate overrides for every possible combination of activation, layer, and HLS implementation if we want to expand it. Instead of extending the kernel of the matrix-vector multiplication to tack on the ReLU computation and then creating duplicate function calls for that, why don't we introduce a new operation, a no-op by default, that sits at the end of the dense function? This probably sounds unclear, so if you have 15 minutes we can chat over Zoom.
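The "no-op by default" idea above can be illustrated with a small Python sketch: the dense kernel takes a post-op applied to each output, the default is identity, and merging an activation just swaps in a different callable instead of duplicating the whole kernel per (layer, activation, implementation) combination. All names here are illustrative, not hls4ml API; the real implementation would be templated HLS C++.

```python
# Hypothetical sketch of a dense kernel with a pluggable post-op.
# By default the post-op is the identity (a no-op); merging ReLU
# means passing a different callable, with no duplicated kernels.
def identity(x):
    return x

def relu(x):
    return x if x > 0 else 0

def dense(weights, bias, inputs, post_op=identity):
    outputs = []
    for w_row, b in zip(weights, bias):
        acc = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(post_op(acc))  # no-op unless an activation was merged
    return outputs
```

The same pattern extends to other activations (or future fused ops) by supplying a different `post_op`, which is the appeal of this approach over per-combination overrides.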
Should we discuss @vloncar's suggestion? It seems like there is quite a bit of interest in this PR so it will be good to get it in.
It was discussed offline and we converged on the proper approach to this.
Any more news on this?
@abijithYayavaram is currently building a more generic version. We aim to push some updates within the next several weeks.
What is the plan for this in general? Is this where we'll attack with the new code generation framework?
I believe we should revisit this once we have a better way of generating functions in which activations would be merged. There's little gain in general if FIFO depth optimization is applied (and none for io_parallel).
Description
We introduce an hls4ml optimizer pass that merges the ReLU layer into the preceding Dense/Conv2D layer when ReLU immediately follows it, a frequently encountered pattern in neural networks (NNs). NNs in hls4ml are spatially laid out using dataflow stages to implement each layer, linked together by FIFOs. These FIFOs can cost BRAMs, LUTs, and/or FFs. By default in hls4ml, each ReLU is implemented as its own dataflow stage. Because each additional dataflow stage costs extra logic and FIFOs, merging the ReLU activation function into the layer preceding it reduces resource utilization. Although the layers with the newly merged ReLU functionality use more logic than before, there is still a net decrease in resources. This optimization was introduced in hls4ml's MLPerf TinyML Benchmark 2022 submission and written up in this paper, which reports the resulting resource reductions.
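Conceptually, the pass rewrites the layer graph: it removes the ReLU node and records the merged activation on the preceding Dense/Conv2D layer, so the backend emits one dataflow stage instead of two. The dict-based "graph" below is a simplified stand-in for the hls4ml model graph, for illustration only.

```python
# Hypothetical sketch of the merge transform on a toy layer list.
# A ReLU directly following a Dense/Conv2D layer is folded into that
# layer (one dataflow stage, one fewer FIFO); other layers pass through.
def merge_relu(layers):
    merged = []
    for layer in layers:
        if layer['type'] == 'ReLU' and merged and merged[-1]['type'] in ('Dense', 'Conv2D'):
            merged[-1]['merged_activation'] = 'relu'  # fold ReLU into the preceding layer
        else:
            merged.append(layer)
    return merged
```

A model such as Dense → ReLU → Conv2D → Softmax would come out as Dense(+relu) → Conv2D → Softmax, with the softmax left as its own stage since only ReLU is handled by this pass.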
Type of change
This optimization pass was first mentioned in the MLPerf TinyML PR #503.
Tests
This repo contains two test models (a fully-connected NN and a CNN) that can be trained on MNIST, converted into Vivado HLS, and synthesized using Vivado HLS 2020.1.
Checklist